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Abstract. Edge detection is the process that attempts to characterize the intensity changes 
in the image in terms of the physical processes that have originated them. A critical, 
intermediate goal of edge detection is the detection and characterization of significant 
intensity changes. This paper discusses this part of the edge detection problem. To 
characterize the types of intensity changes derivatives of different types, and possibly 
different scales, are needed. Thus, we consider this part of edge detection as a problem in 
numerical differentiation. 

We show that numerical differentiation of images is an ill-posed problem in the sense 
of Hadamard. Differentiation needs to be regularized by a regularizing filtering operation 
before differentiation. This shows that this part of edge detection consists of two steps, a 
filtering step and a differentiation step. Following this perspective, the paper discusses in 
detail the following theoretical aspects of edge detection: 

(1) The properties of different types of filters—with minima! uncertainty, with a bandpass 
spectrum, and with limited support—are derived. Minimal uncertainty filters optimize a 
tradeoff between computational efficiency and regularizing properties. 

(2) Relationships among several 2-D differential operators are established. In particular, we 
characterize the relation between the Laplacian and the second directional derivative along 
the gradient. Zero-crossings of the Laplacian are not the only features computed in early 
vision. 

(3) Geometrical and topological properties of the zero crossings of differential operators 
are studied in terms of transversality and Morse theory. 

We discuss recent results on the behavior and the information content of zero crossings 
obtained with filters of different sizes. These results imply a specific order in the sequence 
of filtering and differentiation operations. Topological properties are preserved by level- 
crossings. Setting a level in the optimal filtering stage is a threshold operation —- which 
can be implemented in an adaptive way — that preserves all the “nice" geometrical and 
topological properties of zero crossings. 

Finally, some of the existing local edge detector schemes are briefly outlined in the 
perspective of our theoretical results. 
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1. INTRODUCTION 


Vision begins with the transformation of a flux of photons into a set of intensity values at 
an array of sensors. The first step in visual information processing is to obtain a compact 
description of the raw intensity values. The primitive elements of the initial description 
should ideally be complete in the sense of representing the full information contained in 
the image, and meaningful (that is, capturing significant properties of the three-dimensional 
surfaces around the viewer). Physical edges are one of the most important properties of 
objects since they correspond to object boundaries or to changes in surface orientation 
or material properties (Ballard and Brown, 1982; Binford, 198T 1982; Brady, 1981; Canny, 
1983; Davis, 1975; Hildreth, 1980; Marr and Hildreth, 1980; Pavlidis, 1977: Rosenfeld and 
Kak, 1976). 

Three-dimensional edges are often mapped by the imaging process into critical points of 
the two-dimensional intensity profile formed in the eye or in a camera. The ultimate goal 
of edge detection is the characterization of intensity changes in the image in terms of 
the physical processes that originated them. For instance, a shadow may be distinguished 
from an occluding boundary and material properties may be identified from the associated 
intensity changes. 1 A traditional belief in computational vision—that we fully share—is that 
this goal cannot be reached in a single step. At least two separate stages are required. 
First, one needs to characterize the intensity changes in the image. Second, one uses 
this representation, combined with high-level knowledge, to make assertions about the 3-D 
surfaces and their properties. 

The first part of edge detection then, requires the evaluation of derivatives of the image 
intensity. To characterize the types of intensity changes, derivatives of different type and 
order may be needed, possibly at different scales. The first part of edge detection is thus 
a problem in numerical differentiation. In this paper, we will consider only this first stage 
of edge detection as the process that attempts to detect, localize and characterize local 
edges , the sharp changes in intensity that are natural primitives for later processing. We 
will not consider here the second stage of edge detection that includes processes such 
as boundary detection, segmentation, region growing and groupings of local edges (that 
group local edge elements into structures better suited for the interpretation of image data 
in terms of the underlying physical processes). 

In this paper, we begin by analyzing the problem of differentiating a sampled image. We 
show that differentiation is an ill-posed problem (in the sense of Hadamard). Well-posedness 
and numerical stability of the differentiational step requires the regularization of the image 
intensities by a regularizing filtering operation preceding differentiation. This argument 
represents a novel and rigorous justification of the basic sequence of filtering and 
differentiation that can be recognized in all existing local edge detector schemes. We then 
examine in detail the filtering and the differentiation stage. We continue our analysis by 
characterizing properties of the critical points of the differentiation operation. 

Our main practical conclusions in this paper are (a) that Gaussian filtering, although not 
optimal under all conditions, is near-optimal, and computationally convenient; (b) the choice 
between rotationally invariant operators (rotational filters and rotational invariant differential 
RID operators, as the Laplacian or the second derivative in the direction of the gradient) 
or directional (directional filter and directional differential DD operators) operators (such 
as directional derivatives) depends on the subsequent information processing task. RID 
operators ensure closed edge contours, that are not provided in general by DD operators. 

We now outline the organization of this paper in more detail. 

'The use of color information—which we will not discuss in this paper—is a natural extension 
within this framewoik. 
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1.1. Organization of the paper 


In this paper, we consider edge detection as the process of computing derivatives in the 
two-dimensional intensity image. In Section 2, we show that the problem of differentiation 
of a sampled image is ill-posed. We prove that filtering of the image prior to differentiation 
is necessary for regularizing the problem and make it well-posed. The filtering step is 
analyzed in Section 3. Filters with minimal uncertainty (Hermite and Gabor functions), with 
bandpass properties (sine and prolate functions) and others that are support-limited are 
reviewed. Filter with minimal uncertainty tend to optimize the trade off between band-limited 
characteristics (required for a correct sampling and for “regularizing" the differential 
operation) and computational efficiency. 

Section 4 is devoted to the differential stage. We consider separately the second order RiD 
and DD operators and analyze their main properties. The main focus is on the localization 
of the zeros of the Laplacian V 2 , the second derivative along the gradient and the 
usual second order partial derivatives. Section 5 considers the geometrical structure of 
the contours formed by edge detectors and in particular their closure property. For this 
purpose, we use elementary tools from Morse and Thom theories. The problem of the 
geometry of contours across different spatial scales—where scale is parameterized by the 
size of the filter—is considered in Section 6. A comparison of the results of our study with 
several previously proposed edge detectors is given in Section 7, and a discussion of the 
“best" filtering and differential steps is given in the final section. 


2. COMPUTING DERIVATIVES OF IMAGES 

In this chapter, we consider the problem of computing (spatial or temporal) derivatives 
of sampled intensity images. Our main result is a rigorous justification of filtering before 
differentiation in terms of the theory of regularization. Our approach also clarifies the issue 
of the optimal filter for edge detection. In practice, it justifies the use of suitable derivatives 
of gaussian-like filters in edge detection (for linear differential operators). 

In the first section, we discuss the ill-posed nature of differentiation, which is equivalent to 
its lack of robustness against noise in the input data. In section 2.2, we review the main 
techniques for transforming differentiation into a well-posed problem. Section 2.3 shows 
that numerical differentiation can be regularized via previous convolution of the image data 
with an appropriate filter. In section 2.4, we consider the application of two of the general 
regularization techniques, and show that they lead to spline interpolation and to spline 
approximation respectively (prior to the differentiation stage). In these methods, regularized 
differentiation is thus performed by convolving the data with an appropriate derivative of 
the regularizing filter. In some situations, however, it may be more convenient in practice 
to first filter the data and then differentiate the results. We consider some implications 
of this situation in Appendix 1. The problem of sampling appropriately the image prior to 
filtering and differentiation is discussed in Appendix 2. interpolation, approximation and 
differentiation are discussed in Appendix 4. 

2.1. Ill-posed nature of differentiation 

In machine vision, as well as in most numerical problems, the data are noisy. Noise in 
the phototransduction process is ultimately unavoidable. Sensor noise arises at least in 
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part from quantum fluctuations in the number of absorbed photons per sensor and unit 
time. This represents a fundamental limitation for real time imagery when integration time 
and size of the sensors are limited by the need of high temporal and spatial resolution. 
It is critically important, therefore, that the results of numerical operations performed on 
the data are not too sensitive to noise. It is well known that differentiation is not robust 
against noise. Even a small amount of noise may disrupt differentiation. Let us consider 
a function f(x) and f(x) — f(x) + ( sin wx. f(x) may be close to f(x) according to standard 
norms (//•, L x ...), provided r. is sufficiently small. On the other hand, f'(x) may be quite 
different from f (x) if w is large. 

In the beginning of this century, Hadamard (1923) defined a mathematical problem to be 
well-posed if its solution 

(a) exists 

(b) is unique 

(c) depends continuously on the initial data (this is equivalent to saying that the solution is 
robust against noise). 

Most of the problems of classical physics are well-posed in this sense, and Hadamard 
argued that meaningful physical problems had to be well-posed. 

Now differentiation of the function f(x) is a typical ill-posed problem, since it can be seen 
as the solution to the inverse problem 


g(x) = Af(x) 


[ 2 . 1 ] 


where Af(x) is the integral operator 

/ oe 

k(x — x)f(x)dx [2.2] 

-OO 

where h is the step function. It is well known that inverse linear problems in which g(x) and 
f(x) belong to Hilbert space are ill-posed (Tikhonov and Arsenin, 1977; Bertero, 1981). 

2.2. Regularization techniques 

Rigorous methods for transforming ill-posed problems into well-posed problems have been 
developed over the past years (see especially Tikhonov, 1963; Tikhonov and Arsenin, 1977; 
and Nashed, 1974, 1976). Regularization of the ill-posed problem of finding z from the data 
;</, such that Az --= y requires the choice of suitable norms || ||, (usually quadratic) and of 
a stabilizing functional ||/’z||. The choice of the stabilizing functional and of the norms is 
dictated by mathematical considerations, and most critically, by an analysis of the physical 
constraints on the problem. There are three main methods of standard regularization 
(Bertero, 1981): 

(1) Among z that satisfy ||/'a|| < C t (where (7, is a constant) find z that minimizes 



\\Az - 2 /|| [2.3] 

(2) Among z that satisfy ||/1z-~ ?/|| < C-> find z that minimizes 

II/’Hi [2.4] 


and (3) Find z that minimizes 
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where X is a regularization parameter. 

The first method consists of finding the function 0 that satisfies the constraint \\Pz\\ < C u 
and best approximates the data. The second method computes the function z that is 
sufficiently close to the data and is most “regular". In the third method, the regularization 
parameter X controls the compromise between the degree of regularization of the solution 
and its closeness to the data. 

Differentiation can also be regularized using the stabilizing operators introduced by Tikhonov 
(Tikhonov and Arsenin, 1977; Bertero, 1981). In the case of differentiation these operators 
are equivalent to filtering the data with low-pass filters of the kind we will discuss in chapter 
3. 

In the next section, we show how to use method 2 and 3 directly for solving the ill-posed 
problem of numerical differentiation. In section 2.4, we will consider a wide class of 
regularizing filters that correspond to Tikhonov stabilizing operators and can be used to 
make numerical differentiation well-posed. 

2.3. Regularizing differentiation with interpolating and approximating splines 

Poggio, Voorhees and Yuille (1984) have recently applied the second and the third 
regularizing methods to the problem of edge detection. Following Schoenberg (1964) and 
Reinsch (1962), they chose for /’ the simplest form of Tikhonov’s stabilizing functionals 
with V — -£ 7 and the usual L> norm. This choice corresponds to an “a priori” constraint 
of smoothness on the intensity function. Its physical justification is that the noiseless image 
has to be smooth in the sense that all its derivatives must exist and be bounded because the 
image is band-limited by the optics. Physically, this constraint of smoothness allows us to 
eliminate effectively the noise that creeps in after or during the sampling and transduction 
process, and makes the operation of differentiation unstable and ill-posed. This is, of course, 
not the only stabilizing functional for this problem, as we will see in the next section, but 
it is probably the simplest one. 

Let us now consider in more detail for the second and third regularization methods. Consider 
a function f(x) defined in [a, X] and be A = a < x a < x x ...x n — !> a mesh of distinct points, 
and 


I, = J(x k ) [ 2 . 6 ] 

the values of f{x) at x k . Given the sample points of f k , the problem of computing the 
numerical derivative f' k at x k is ill-posed. The second regularizing method leads (usig the 
stabilizing operator P — and the /, 2 norm) to the search of a function S(x) such that 
(a) 




S{x *) = /* * = [2.7] 

and (b) ||/\S’(:r)|| is minimized. The stabilizing functional P is 

fa-8] 

J a 

The solution to this problem is given by the cubic spline Ss[x) which interpolates /(.r) in 
A (Ahlberg, Nilson & Walsh, 1967). As a consequence, the numerical derivative will be 
the value of S' A (x) in x k . For equidistant points the following equation holds 

S\ {'»(!)(/* h - fk ,)■ « s (!)(/,f M 3 (l)(/* t3 -A. ;>)■••}, [2.«] 


5 
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where h is the sampling period, and 



that is 


[2.9] 


f'k = ^[.8W(/ t+ i - fk-i) - .215(A +2 - /*_•>) + .0577(A + 3 - / fc _ s )...] [2.10] 

Poggio et al. (1984) have obtained the following theorem which is a reformulation of results 
due to Schoenberg (1946, 1964): 

ThearerrcThe cubic spline interpolating the data points assumed on a regular lattice and 
satisfying the second regularizing method with /' = ~ can be obtained by convolving the 
data points with the cubic spline filter, which corresponds to the /;' function of Schoeberg 
(1946). 

Numerical differentiation, therefore, can be regularized for exact data on a regular grid by 
convolving the data points with the first derivative of the /,' filter given by Schoenberg, 
which is a cubic spline. 

In the case of non-exact data which is the most natural situation, the third regularizing 
method has to be used leading to the problem of finding S(x) such that 

(Ik - S (**)) 2 + X 

k~ I 

is minimum. Both Schoenberg (1964) and Reinsch (1967) proved that approximating cubic 
splines are the solution to this variational problem. Poggio et ai. (1984) have proved the 
following result: 

Theorem -.The solution to the variational problem [2.11] in the case of inexact data on a 
regular grid (and appropriate boundary conditions), can be obtained (a) by convolving the 
data with a filter, (b) which is a cubic spline, and (c) which is very similar to a gaussian. 

This implies that regularized differentiation of image data can be performed by convolving 
the data with the first derivative of a cubic spline filter, which is very close to the gaussian, 
as shown in figure 1. 



/ 


S”(x)fdx 


[ 2 . 11 ] 


This result probably is the simplest and most rigorous proof that a gaussian-like filter 
represents the correct operation to be performed before differentiation for edge detection. 
We refer to the paper by Poggio et al. (1984) for a detailed proof of this result and for a 
comparison between the optimal filter and the gaussian. Poggio et al. (1984) also analyze 
the role of the regularizing parameter X, its connection to the optimal scale of the filter, 
and discuss methods for finding the optimal X. 

2.4. Regularizing filters 

In the previous section we have seen that differentiation can be regarded as the inverse 
problem of the integral equation 

<j(x) ----- f f(x)dx [ 2 . 12 ] 

J ... 

where f(x) must be recovered from the knowledge of the data <>(.c). which is usually 
given only on a discrete lattice. This problem is ill-posed, and can be regularized by the 
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a) 


R Filter 
Gaussian Filter 



b) 



Effect of the smoothing parameter X on R 1 


Figure 1 a) The convolution filter obtained by regularizing the ill-posed problem of edge 
detection with method (III) (see Poggio et at., 1984). It is a cubic spline (solid line), very 
similar to a gaussian (dotted line), b) The first derivative of the filter for different values of 
the regularizing parameter \, which effectively controls the scale of the filter (from Poggio 
and Torre, 1984). 


regularizing methods previously mentioned. Furthermore, Tikhonov and Arsenin (1977, see 
also Bertero, 1981) have proved that it is in general possible to regularize inverse problems 
by using Tikhonov’s stabilizing operators. For equations of the convolution type as equation 
[2.12], the stabilizing operators correspond to convolving </(x) with a filter F(x,a), (where 
a > 0 is a parameter) whose Fourier transform F(uj, a) satisfies the following conditions: 

(Cl) T (w, ft) is bounded for <x ^ 0 and all uj, 

(C2) ) is an even function with respect to lj, and it belongs to L 2 (-oo, +°o). 

(C3) F(bj,a)ju> belongs to /, 2 .(—oo,+oo). 

(C4) For every a > 0 it holds li-ni| u ,| 1 _ 00 F(w ,«) = 0. 

(C5) F(u,a) h-> l as a 0 and /~’(u>,0) — 1. 

This regularizing filter is equivalent to a smooth low pass filter. In the next chapter, we will 
discuss three different classes of low pass filters that have been used for edge detection. 
The first two of them fully satisfy the previous conditions (C1-C5), and are therefore 
regularizing filters in Tikhonov’s sense. As a final remark, it is interesting to notice that 
this regularizing filters usually correspond to the solution of variational principles of the 
type provided by the third regularization method with an appropriate stabilizer /’ (compare 
Tikhonov and Arsenin, 1977, page 121). 


3. FILTERING 

In this section, we will make some preliminary observations on filtering and then, we will 
review three kinds of low pass filters, which have been used in machine vision for edge 
detection. We will consider bandpass filter in section 3.1, support-limited filters in section 
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3.2 and minimal uncertainty filters in section 3.3. Our conclusion is that bandpass filter as 
well as minimal uncertainty filters are good regularizing operators for differentiation in the 
sense of Tikhonov, while support-limited filters are only marginally useful. 

As in the study of functions in analysis, many properties of intensity changes can be 
characterized in terms of zeros of appropriate derivatives. For instance, one-dimensional 
step edges in intensity correspond to extrema of the first derivative, whereas roof edges 
correspond to zeros in the first derivative. The main goal of the filtering and differentiation 
stage in edge detection is to produce a representation of zeros and extrema. Interestingly, 
the type of derivative — whether directional or rotationally invariant — and the type of 
representation — whether zeros or extrema — dictate some general properties of the filter 
to be used. We will now briefly discuss these two points. 

The first point is obvious: directional derivatives require one-dimensional filters properly 
oriented along the chosen direction; when rotationally invariant operators are used, the 
filter / is a function of the radial coordinate p. 

We restrict ourselves to examine linear, space invariant filters. Since isotropy can be 
assumed, the shape of filters, when viewed one-dimensionally, is an even or odd function. 
Let us now consider the implications of this for the case of step intensity edges. Because 
of the arguments developed in the previous section, we detect intensity edges from the 
zeros of a suitable derivative of the filtered intensity profile (i.e. its critical points). If the 
shape of the step edge to be detected is S(x), defined as 



then the output g(x) of the convolution f(x) * S(x) where f{x) is the filter, will be 

(j(x) = F(x) - F{-oo), [3.1] 

with F(x) the integral primitive of f(x). Therefore: 

•The extrema of </{x) correspond to the zeros of f[x). 

•The zero-crossings of correspond to the extrema of f(x). 

Three consequences can be derived from these observations: 

1. If we are interested in the extrema of the output g(x), and if we want to have an extremum 
located at the position of the edge, then f(x) must be an odd function. 

2. If we are interested in the zero-crossings of g(x), and if we want to have a zero-crossing 
located at the position of the edge, then f(x) must be an even function. 

3. If we are interested in the extrema or zero-crossings, and if f(x) has many zero-crossings, 
we will have many secondary extrema or zero-crossings. To avoid false edge detection, 
/(:r) should have the least number of zero-crossings, and the optima! situation would then 
be: 

• If f(x) is odd, then f(x) has only one zero. 

• If f(x) is even, then /(.r) has no zero. 

3.1. Band-limited filters. 

Band-limited filters are an obvious choice for regularizing differentiation, since the simplest 
way to avoid harmful noise is to filter out high frequencies that are amplified by differentiation. 
Linear and circular prolate functions constitute an especially interesting class of band-limited 
filters (Frieclen, 1971); Landau and Pollack, 1961). Linear prolate functions </’„(.< ) are defined 
by the relation 
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r-N, 



t n 27 rX T , 

TT 


, , /wx 0 \ 

M~irb 


3.2 


where X„ are called “linear prolate eigenvalues". From [3.2], we see that V«(x) depends 
on two parameters, x„ and 0, whose significance will be seen later. The value of X„ is a 
function of r — x„n and may be written as \ n (x,c). ip n (x) depends on The main properties 
of 4’„(x) are 

(1) tl> n (x) are band-limited, 

(2) i]> n (x ) are orthogonal on both the interval [—ar 0 , ^o] and [-oo,+oo], with 


ij>, l (x.)4’ rn (x)dx — X r! 8 nrn 

[3.3] 

t n(x}< lx 6 n m 

(3) ij’ n {x) form a complete set of functions of the space of band-limited functions whose 
Fourier transform F(u) is F(u>) = 0 for |w| > H. 

In the defining expression [3.2] of V>„(x), there are three constants, ft, X n and x 0 . From 
[3.3], we see that 0 is the cutoff frequency and from [3.3]. x„ is half the length of the 
finite interval over which linear prolate functions $ n {x) are orthogonal. From [3.3], we also 
see that X„ represents the fraction of energy of W,(x) within |x| < x„. The dependence of 
X„ on c. = x„n is shown in Figure 4.3 of Frieden (1971). Therefore, once we have chosen 
f], we can find c, and consequently x 0 such that the energy of i> n {x) is almost completely 
contained in |x| < x„. 

Linear prolate functions have the nice property that the band-limited function with cutoff 
frequency 0 and maximal energy concentrated in [-x 0 ,x„] is ij>„{x) with c= Hx„. Similarly, 
the odd band-limited function with cutoff frequency H that has maximal energy concentrated 
in [-x„,x„] is V’i(-'r) with c — il ■ x„. Linear prolate functions are also useful for solving the 
inverse problem; that is, the strictly support-limited function in \-x„,x u ] that has maximally 
concentrated frequencies in — [SI, n] is 



/ — Ib>x„1 K{x, c) C — x 0 • n, [3.1] 

where D Xo is the operator defined as 



These results show clearly the difference between ij> „(x) and sinc(x). They are both 
band-limited, but i/>„{x) falls off more rapidly than sinc(x) (see Fig. 2 of Landau and Pollack 
1961). On the other hand, the strictly support-limited function, which has the minimal spread 
of frequencies, is not a Haar function or a Difference Of Boxes filter (see later) but is 

0 y-„(x). 

Oscillations in the filter may produce ringing phenomena in the edge detection process. 
To reduce these phenomena, it is necessary to have maximal energy of (or </t (x)) 
concentrated in the main lobe. With a value of <■ equal to 7, more than 99 per cent of the 
energy of and Vi(-') is concentrated in [~x ( ,,x„]. 

It is immediate to verify that bandlimited filters satisfy all conditions (see section 3) of 
Tikhonov in order to regularize differentiation. 

If we are interested in rotationally invariant two-dimensional filters that are band-limited, 
we can simply take even linear prolate functions n 0,2,-1... and substitute x with 


0 
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\/x 2 ■+ y 2 — p. Now i/> n {p) is a band-limited function, but does not have the two-dimensional 
analog of properties (1)- (3). These properties are satisfied by the circular prolate functions 
defined by relation: 

J 0 J„{^p)^n{f>)l>dp ~ ')” ([ [3.6] 

where ./„ is the Bessel function of order zero. 

3.2. Support-limited filters 

All real filters have a finite extension and are support-limited. Computational efficiency 
requires that the support of a filter is as compact as possible. Therefore it is interesting to 
investigate the properties of filters with strictly limited support. The simplest even filter with 
a strictly limited support and with unitary energy is 


f( ] (^A/V2D M < I) 

Aj l = o \x\ > ir 


whose Fourier transform F{u>) is 



sin u>D 
ui 


In two dimensions, we have 



whose Fourier transform is 


F(u) = 


yfH w 


[3.7] 


[3.8] 


This kind of filtering represents the well-known “blurring" of the image through a circular 
aperture of radius p a - 

It is important to observe that this class of support-limited filters fails to satisfy, in a strict 
sense, the five conditions of section 2.4. In particular, condition (3) (F{u,a)ju> belongs to 
// 2 ( -oc,+oo) is not satisfied (because differentiation introduces back high frequencies in 
the same amount as they are removed by this type of filtering). Thus, support limited filters 
are not good regularizing filters in the sense of Tikhonov. Nonetheless, this class of filters 
can be still considered as regularizing operators in a weak sense. 

If we are interested in odd filters, the simplest support-limited filter is 



0 

. .t _ 

\/-j« 

_ i 

ViD 


1*1 > D 

0 < ,r < I) 

I) < x < 0 


whose Fourier transform is 


[3.8«] 


n») - . 


I) 


(I 


>»)■ 


[3.86] 
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This filter has already been proposed by Herskovitz and Binford (1970) and is commonly 
called DOB (Difference of Boxes). It is also a Haar function (see Fig. 19 of Harmuth 1972 
and page 399 of Kolmogorov and Fomine, 1974). The system of Haar functions is complete 
and constitutes a basis for all square integrable functions on a bounded interval. This 
property may have some relevance in the context of image processing with Haar functions. 
Support-limited filters that are even functions can be easily extended in two dimensions 
by a simple rotation around the origin. A complete set of support-limited functions in two 
dimensions, which can be used as filters, is the Haar system with two variables (see 
Harmuth 1972). The Haar function of equation [3.8] has the nice property of being the 
optimal support-limited filter that maximizes the signal-to-noise ratio for an ideal step edge, 
S(x). It is easy to see that spatial spread of f{x) favors the signal-to-noise ratio, while 
spatial concentration favors localization, for instance of zero-crossings. This can be seen 
as another formulation of the uncertainty relation (Canny, 1983). 

3.3. Filters with minimal uncertainty 

In the two previous sections, we analyzed band-limited and support-limited filters. Band- 
limited filters have theoretically infinite support. A drawback of support-limited filters is that 
they are regularizing only in a weak sense. It is natural then to try to find an optimal 
compromise between these two types of filters. A measure of the spread of a function 
/ e A 2 (3?) in the space and frequency domain is the uncertainty All, defined as: 

AU — UX, [3.9] 

where 

y2 f-~(*-x) 2 f 2 (x)dx . 

x r +oc ru \ I 

J-o o fHx)dx 


X = 



[3.11] 




rr 




F(oj) is the Fourier transform of f(x) and 


u> — 


/ + oo 

Cl>|/'’(w)| 2 </u). 

~oo 


[3.12] 


[3.13] 


Notice that n- is proportional to the density of of zero-crossings for Gaussian white noise 

(Papoulis, 1962; Papoulis, 1965, p. 487). It is well-known that the Gaussian function 
is the real function / c. /F(')i) that minimizes the uncertainty A//. On these grounds it has 
been proposed by Marr & Hildreth (1980) as the optimal filter. The uncertainty of an even or 
an odd function / c : /r(')i') can be easily computed if its representation in terms of Hermite 
functions is known; that is, if we know the set of c n such that: 


4-oc 

/{■>■) - <■»¥>»(*), 
n 0 


[3-1 -1] 
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where 


<Pn(x) — e H n (x) [3.15] 

H n (x) is the Hermite polynomial of order n. The uncertainty of <p n {x) is simply n+ If }{x) 
is an even function, then 


-f-oo 

f[ x ) = ^2 C ‘^<P2k{x), 

k = 0 


[3.10] 


and the uncertainty At/ is given by 


At/ — \J A 2 - B 2 

„ ES(^ + I)4 

ll/ll 2 ___ [3-17] 

U _ ^> k=o C2k+2C,> k y/(2k + 2 )( 2 k + 1 ) 

ll/ll 2 


If f(x) is odd, 


+ OC 


f( x ) = c 2h+l<P2h+l( x ), 


h—Q 


[3.18] 


and the uncertainty A V is given by 

A V = \//F 


B 2 


A = 


B- 


E,I-%(2/r + 1 + , 


EtZ c ~h+:i c 2h +1 \/W> + 3)(2/» + 2) 


ll/ll 2 


[3.19] 


Equations [3.16]—[3.19] follow from properties of Hermite functions. We can easily see that 
the uncertainty of Hermite functions <p„(x) increases with n as the number of zero-crossings 
of <p„(.n) increases. From these observations, we see that good filters will be composed 
by Hermite functions with low n. From the point of view of uncertainty, the optimal even 

— i2 — xji 

filter is rf* 8 , and the optimal odd filter is (the two-dimensional case has been treated 
by Daugman, 1904a). Another class of functions with small uncertainty consists of Gabor 
functions 




< l , t>{ x ) 


V a* 


t ,y(27r/„ac4 <M 


[3.20] 


They are complex functions of a real variable and have uncertainty AT’ equal to \. However, 
the real and imaginary part of </■„(./ ) do not have minimal uncertainty. The only real function 
with uncertainty equal to ], is the Gaussian. 

Filters with minimal uncertainty, as well as bandlimited filters, satisfy the conditions of 
section 3 in order to regularize differentiation. 
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Figure 2 Comparison between the gaussian and a prolate function. See text. 


3.3.1. Relation between prolate and Hermite functions 

The essential difference between prolate and Hermite functions is that the former are 
band-limited and fall as while the latter fall off faster—somewhat too fast to be 
band-limited. It has been shown, however, that a crude approximation of <j>„{x), when c is 
large (see Slepian 1965) is 


ip n [x) ~ D n {xy/2c), [3.21] 

where D n is a Weber parabolic cylinder function. Now 

l) n (xV2c) — e~ n n (xy/c) =* i/’n(x), [3.22] 

where // n (.r) are the usual Hermite polynomials. This approximation fails for large rr, where 
falls off as j and P n (x) as a Gaussian function. However, when <• is larger than 7, 
then ij’„(.r) and ■</> ( (.<■) have more than 99 per cent of their energy in [ -x„,x a ], where the 
approximation [3.22] is satisfactory. In Fig. 1, we see the comparison between a Gaussian 

function (dotted line) with variance equal to and t„(x) (solid line) with r equal to 7. 
t/>„{x) has been computed according to the approximation described in Frieden (1971). 
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3.3.2, Gaussian filtering and the heat equation 

We consider here briefly an interesting analytic property of Gaussian filtering of images. 

Gaussian filtering, i.e. the convolution of the image I{x, y) (when l(x,y) is bounded and 
continuous) with the Gaussian, 


c jr' , [3.23] 

can be seen as a solution at an appropriate time t — ^ of the two dimensional heat 
equation 


with the initial condition: 


d 2 u (> 2 u du 
dx 2 dy 2 dt ’ 


u{x, y, 0) = 1(x, y). 


This is because the “source solution" of the heat equation (Widder, 1975) is 


[3.24] 


-i» 2 + » 2 ) 

k(x, y, t) — —— " , [3.26] 

\f\nt 

with time playing the role of the variance, that is, 

<7 2 = 2 1. [3.27] 

From Theorem 4.1 of Widder (1975) the solutions of the heat equation are entire functions 
of x and y. In other words the convolution of a continuous and bounded function with 
a Gaussian generates an entire function. This characterizes well the strong regularizing 
properties of the gaussian filter. 


4. Differentiation stage 

In this chapter, we will discuss the properties of some differential operators that have been 
proposed and used in edge detection. We first briefly consider directional derivatives in 
section 4.1. In section 4.2 we discuss properties of two second-order rotationally invariant 
differential operators: the Laplacian and the second derivative along the direction of the 
gradient. We stress here that it is unlikely that zero-crossings of one differential operator 
— such as the Laplacian — are sufficient for early vision. 

The many two-dimensional differential operators that can be used for detecting sharp changes 
in intensity can be classified according to whether they are (a) linear or nonlinear, and (b) 
directional or rotationally symmetric. In this paper, we use the (somewhat inappropriate) 
terminology of zeros of a differential operator Pf (f defined in l' c !)?-) in the sense of the 
locus of points of 1 such that Pf 0. This notion is different from the usual definition of 
the kernel of an operator P, that is, the set of function / such that Pf 0 in V. 
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4.1. Directional differential operators (DD) 

The directional differential DD operators used in edge detection are the usual directional 
derivatives. The use of directional operators has been criticized (Hildreth, 1930) on the 
grounds that such operators lead to smearing of zero-crossing contours (see Fig. 11 of 
Hildreth 1980). In that case the vertical operator was implemented with an operator of n x m 
pixels. Smoothing was performed in both the orthogonal and the parallel direction to the 
filter's orientation. A correct implementation of a vertical derivative however consists of an 
operator of I x m pixels. The smearing observed by Hildreth (1980), however, is not due 
to the use of a directional operator but to the distortion introduced by a too large width of 
the operator. The concomitant use of several directional derivatives has been proposed by 
several authors (Binford 1982; Canny 1983). Since in ft 2 the directional derivative in any 
arbitrary direction can be expressed in terms of £ and £, it is evident that in a noise-free 
image the use of more than two-directional derivatives is of no help at all. In a noisy image 
the use of several directional derivatives may be useful for increasing the signal-to-noise 
ratio. 

We will see in a later paper that the use of just two narrow directional derivatives is sufficient 
to detect all edges detected by rotationally invariant differential operators or by a large set 
of directional derivatives. 

4.2. Rotational invariant differential operators (RID) 

Rotationally symmetric operators have several attractive features. Two of the most widely 
used operators of this class are the Laplacian (V 2 , which is linear) and the second 
directional derivative along the gradient (f^) which is nonlinear. We will derive in this 
section several properties of the two derivatives and especially of their zero-crossings. In 
particular, we derive a necessary and sufficient conditions on image intensities for the 
zero-crossings of the two derivatives to coincide. 


4.2.1. Null space of the Laplacian and subharmonic functions 

Certain classes of functions do not originate zero-crossings in the Laplacian: they are 
harmonic and subharmonic functions. Harmonic functions are the null space of the Laplacian 
operator. Interestingly, they are invariant with respect to heat diffusion and therefore do not 
change under convolution with a gaussian of any size (Yuille, pers. comm.). This property, 
however, is not stable. Another non trivial result is that any non-linear function 4> of an 
harmonic function has zero-crossings at the locations of the inflection points of <f> (Yuille, 
Poggio and Ullman, pers. comm.). Harmonic functions are non-generic, in the sense that a 
small perturbation destroys the harmonic property. 

Subharmonic functions are such functions that the modulus of their Laplacian is everywhere 
positive (Daugman, 1984a). These functions are robust against small perturbations. 




4.2.2. Cartesian and polar form 


We just give the explicit representation of the two operators in cartesian and polar 
coordinates: 


V“/ -- f xx + j yy 


P;7 , I <lf 1 0-f 
Op 2 p hp * p~ ho' 2 


[4.1] 
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0" f f xfxi + 2 fxfyfxy + fyfyy , 2 ()f Of 0~ f 1 f 

777* = ll + ll == V' TTJ, 7)o 7J7To + 77 v 7)7)) TTo* 

_ 1 (<2l\lL) _I_ [/1 - 2] 

A* <h> V <>° ) V ^/') O t > 2 ,,, ; y 2 ^ ^\ 2' 

V'W + P'\ ae ) 

We also give the explicit representation for the second directional derivative in the direction 
orthogonal to the gradient: 


f ?V flf,,~2f X f y f ly + flfyy 
0»i ' fl + Py 


Remark: 

The representation in polar coordinates shows clearly that the two operators are rotationally 
symmetric, since their form does not change for a rotation of the coordinate system 0*. We 
can now state 

•Characteristic Property of Rotationally Symmetric Operators. A sufficient condition for 
an operator to be rotationally invariant is that 0 appears only as derivative in the polar 
representation of the operator. 


4.2.3. Simple properties of V 2 and 

Marr and Hildreth (1980) had attempted to prove that in most cases the zero-crossings 
of the Laplacian coincide with the intensity edges. Since zeros of the second directional 
derivative along the intensity gradient are the natural definition of intensity edges, we are 
able to give here a more rigorous characterization of the problem, in terms of four simple 
properties. 


(I) If the image f[x,y ) can be represented as a function of only one variable, i.e., }{x,y„), 
the two operators V 2 and 7^ are equivalent, i.e., ~f = V 2 /. 

As a consequence, for f(x,y„) the zeros of and of V 2 / coincide. 

Property I is similar but not identical to the “linear variation" result of Marr and Hildreth 
(1980), which states that if / changes at most linearly along the edge direction l, then 


(II) If j yy = f xy — 0 at I’, when ~4 = 0, the zeros of coincides with the zeros of V 2 /. 

The assumptions on the image are here stronger than the condition of linear variation of 
Marr and Hildreth (1980), but are equivalent to the assumptions of their theorem 1: locally 
around the zero-crossing, / has the form f(x, y) — ax + by + c. 


(Ill) If f(x,y) -- f(p) is rotationally symmetric, V 2 / and y’4 differ by the additive term l p 

For circularly symmetric functions, the zeros of V 2 / are farther apart than the zeros of . 
This lack of localization by V 2 (for circularly symmetric patterns) can also be seen in the 
fact that zeros of V 2 (but not of ) “swing wide" of corners. 


(IV). (a) is nonlinear. 

(b) .'77 neither commutes nor associates with the convolution, i.e. 
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i)^ !i *^ {h s ) ‘ f 

[4.4] 


[4.5] 


(c ) ~ is a linear operator on f, if f f(p), but not shift invariant. 

(d) 77?e mean of ~ applied to a zero-mean function need not to be zero. 


4.2.4. Geometric characterization of the zeros of V 2 and 

It is interesting to consider under which conditions the zeros of the Laplacian coincide 
with the zeros of the second directional derivative along the gradient. Zeros of the second 
directional derivative along the gradient are a natural way of characterizing and localizing 
intensity edges. Zeros of the Laplacian, however, are extensively used for their computational 
convenience. In this section we derive rigorous results that clarify completely this set of 
questions. 

Let us consider the intensity surface represented as X = (x, y, s), where 2 = f(x, y) with 
/ G C T (D), I) 6 5i 2 and r > 2. 

The mean curvature of the surface X is 


where 


If 


EN + GL - 2 FM 
2<7 2 


0 + fl)fyy + (1 + fy)!xx - 2/xfyfxy 

—— 


[4.6] 


E — 1 + fl F “ fxf y G = 1 + fy, [4.7] 

are the coefficients of the first fundamental form l(dx,<ly) (Lipschutz, 1969, Pogorelov, 
1965), and 

A/ = ^ N — [4.7] 

9 9 9 

with (j l — 1 4 fl + / 2 , are the coefficients of the second fundamental form Il(<ix,dy). 

We use equations [4.2], [4.3] and [4.8] and the property 


v‘-7 = 

)/ 

V bn 1 dn\ ) 

jF_ . 

On 2 * 



[4.9] 




[4.10] 


We can now characterize the connection between the zeros of V 2 and the zeros of -—4/ : 

tin" * 


Property V: If V/ / 0, the zeros of -F., / coincide with the zeros of V 2 if! the mean curvature 
II is zero. 
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Thus, only for surfaces with minimal curvature (// — 0), the zeros of 40 coincide with 
the zeros of V 2 / where the gradient of / is different from zero. Note that (M. Kass, 
personal communication) V 2 / has the same zeros as 40 where the curvature of the lines 
of level-crossings of the intensity image is zero. Recently, Berzins (1984) analyzed in detail 
the behavior of zeros of the Laplacian of a Gaussian filtered image around corner edges 
and edges with high curvature. He showed that the zeros of the Laplacian are displaced 
from the true edge by less than a (the variance of the Gaussian filtering) when the radius 

of curvature is large compared to a, and when the distance to the nearest sharp corner is 

0 

large compared to - (where (-) is the angle of the corner in radians). Note that eq. [4.10] 
shows that the difference between §0 and V 2 / is small if the mean curvature II is small. 
Smoothing the image with a two-dimensional filter reduces the curvature (and more so 
for larger-sized filters). Therefore, we may expect that in filtered images, V 2 / will perform 
almost as well as §0. 


4.2.5. The normal curvature 

The second directional derivative along the gradient has a simple interpretation in terms of 
the normal curvature along the gradient. The normal curvature K„ in the direction of the 
gradient is (Lipschutz, 1969) 


Ldu 2 + 2M dudv + Ndv 2 . . 

n = Edu 2 + 2Fdudv + Gdv 2 J 

with du and dv as direction numbers. Setting du 2 + dv 2 = 1, the direction numbers along 
the gradient are 


du — 


fx 

|V/| 


[- 1 . 12 ] 


dv = 


fy 

i v/r 


[4.13] 


Thus, equations [4.11] and [4.13], together with equations [4.6] and [4.7], lead to 


K 


1 d 2 

<7 3 du 2 


[ 4 - 14 ] 


In particular, it follows 
Property VI 

The second directional derivative along the gradient and the normal curvature in the 
direction of the gradient have the same zeros when |V/| 0. 

Our geometrical characterization of the gradient and the second derivative along the 
gradient is completed by Appendix 3, that gives the geodesic curvature of the curve directed 
along the gradient. For surfaces of revolution the geodesic curvature of such lines is always 
zero. 

The operator fit. an d the normal curvature in the direction of the gradient l\ n are not 
defined when jV/| — 0. In this case, the direction of the gradient is underdetermined, 
although the Hessian can of course be diagonalized (determining the principal directions). 
Thus, 0 has the disadvantage with respect to V 2 that it is not defined everywhere. 
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4.2.6. Potential biological consequences 

A natural question arising from these comparisons is: which derivative operators are used 
by the human visual system? It is obvious from the earlier sections that several different 
derivatives possibly at different scales have to be used for efficient edge detection. It would 
be very strange if the human visual system would make use of only one differential operator. 
The important questions is therefore which operators or combinations thereof are used 
in different visual tasks and under different conditions. Zero-crossings in the output of 
directional second derivatives approximated by the difference of one-dimensional Gaussians 
(DOG) were suggested by Marr & Poggio (1977) in their theory of stereo matching. Marr & 
Hildreth (1980) later proposed the rotationally symmetric Laplacian V-<7 (approximated by 
a rotationally symmetric DOG) for edge detection and for stereo matching. Psychophysical 
evidence does not rule out either of these schemes. Physiology shows that a class of 
retinal ganglion cells performs a roughly linear operation quite similar to the convolution 
of the image with the Laplacian of a Gaussian. Data on cortical cells are still somewhat 
contradictory on whether some simple cells may perform the equivalent of a linear directional 
derivative operation, or instead, signal the presence of a zero-crossing of the rotationally 
symmetric V 2 C. 

On physiological grounds, it seems unlikely that retinal cells could perform the rotationally 
symmetric nonlinear ~ operation, although not all classes of ganglion cells have been 
tested properly to allow a firm conclusion. In particular, one-dimensional and rotationally 
symmetric patterns are customarily used as stimuli for physiological experiments. In the first 
case and V 2 are equivalent, whereas in the second case, they may be distinguishable 
only quantitatively. Let us now consider three classes of psychophysical experiments. 

(I) An interesting possibility for distinguishing the Laplacian from the directional second 
derivative on the basis of physiological or psychophysical experiments is suggested by the 
observation that the zero-crossings of the Laplacian “swing wide" of gray-level corners. In 
particular, the zero-crossings associated with an elongated black bar, for example, coincide 
for V 2 and whereas they differ in the case of a circular black disk. Hyperacuity 
experiments may allow one to distinguish the two cases. Notice that both operators are 

linear in this case. They associate therefore with Gaussian convolution (C — e^). The 
corresponding point-spread functions are 

(a) for the one-dimensional, f{x): 



(II) for the two-dimensional f(p) 



where a is the standard deviation of the Gaussian function. Let us call w the diameter of 
the central region of these masks, i.e., the distance between the central zeros. «>,/, denotes 
the diameter for the one-dimensional case and n>->» for the two-dimensional case. It is easy 
to see that the second directional derivative has w'( p — whereas this is not true for 
the Laplacian w\ n / From (a) and (b) we get 
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A possible psychophysical test is: 

• If zero-crossings in the Laplacian are used by our visual system to estimate position of 
edges, the apparent width of a narrow 1-D bar and of a small circle (with equal physical 
widths) should be different—the bar should appear smaller. This is not expected if the 
second directional derivative is used. 


(II) There are classes of intensity edges that generate zeros in ~~ but not in V 2 . An 
example is given by: 


l{x,y) = (1 + c^) "— H-IO] 

which, with appropriate values of ft does not satisy V 2 / = 0 for any y > 0. It is possible, 
however, to find solutions to -j'4 / = 0. Thus, the edge / could again be used to discriminate 

psychophysically between V 2 and . 

More in general, functions h € C 2 in a certain region D such that V 2 /t > 0 in D are called 
subharmonic, as we discussed earlier. These functions do not have zero-crossings of the 
Laplacian (Daugman, 1984a), but generally zero-rossings of are present. There are 
special cases, however, in which both and V 2 do not have zeros. An example is given by 
f(x) — COS X + bx~ with V 2 / = £~f — - cos 2 x + 2b, which does not have any zero-crossings 
if > b It would be interesting to test this kind of pattern both psychophysically and 
physiologically (controlling carefully for nonlinearities in the phototransduction). 

Ill) As we mentioned earlier in this chapter, harmonic functions cannot be characterized 
in terms of the zero-crossings of their Laplacian. Worse yet, any image is characterized 
uniquely by zero-crossings of the Laplacian (across gaussian scales, see chapter 6) 
modulus any harmonic function. Psychophysical experiments that measure the detectability 
of edges in subharmonic patterns are difficult to interpret, because they would give a 
clear answer only if the Laplacian were the only differential operator in the human visual 
system, a very unlikely possibility. Furthermore, harmonic functions are unstable against 
small perturbations, making difficult to control for non-linearities in the display and in the 
transduction process. 


5. Geometrical properties of edge contours 

In this section, we will discuss geometrical properties of edge contours obtained by different 
methods. We will show that edges derived through rotational operators are generally smooth, 
closed curves, while edges obtained with directional operators do not have such special 
geometrical properties. 

In many edge detection schemes, as we discussed in the Introduction, the image I(x, y) is 
first filtered and then a second order differential operator I) 2 is applied to the filtered image 
/(.r, y). Edges are identified in correspondence to the zero-crossings of l)- l(x,y). In other 
cases, edges are identified as extrema of some derivative of the filtered image. Again they 
may correspond to zero-crossings of a higher order derivative. In this way, the first part 
of edge detection provides a compact and possibly complete representation of intensity 
changes (see chapter 6). 

Therefore, it is important to analyze theoretically geometrical properties of the locus of 
points defined by 
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where l(x, y) is the filtered image and l)‘ l can be a RID or a DD operator. We first recall 
in the next two sections the notions of transversality (Abraham and Robbin, 1967, Poston 
and Stewart, 1976) and of Morse functions (Poston and Stewart, 1976). In section 5.4 we 
will classify the types of zero-crossings contours that can appear in images. 

5.1. Transversality and zero-crossing (z.c.) 

A curve (or a surface) .S’, meets a curve (or a surface) S 2 in P transversally when the 
tangent space TS t to S, in P and the tangent space TS-> to S 2 in /’ have locally around P 
an empty intersection. More generally, two subspaces U, V of 3t n are transverse if they meet 
in a subspace whose dimension is as small as possible. From this definition, it follows that 
the surface S f — {x,y,f(x,y)) meets the surface S 0 = (x,y,0) in P — (x,y,0) transversally iff 
in (x,y) 


|grad/| ^ 0. [5.2] 

The isotopy theorem (Thom, 1954) shows that transversal intersections are structurally 
stable. The converse is also true in that non-transversal intersections are structurally 
unstable. Transversality (and the implicit function theorem) indicates that if S f meets S 0 
transversally in P then the intersection of S } and S 0 around P is a smooth curve. 

The previous result is only local. Globally we find that if f(x,y) is defined in the compact 
domain V e H~ whose boundary is SV and if S f always meets S 0 transversally, then the 
intersection of S f and S 0 consists of: 

(a) smooth closed curves r c € V. 

(b) smooth curves F t that terminate in SV. 

In other words, transversality of zero-crossings means that zero-crossing contours are 
closed curves or curves that terminate at the boundary of the image. 

5.2. Closed and open contours of z.c. 

From the previous section a necessary and sufficient condition for transversality in P is: 

|grad/T 2 /(.T, y)\,> ^ 0. [5.3] 

A preliminary condition required by eq. [5.3] is that I)~1(x,y) is a differentiable function. 
This condition is obviously met if l(x,y) is analytic (or entire or band-limited). But we have 
already stressed that it is safer to suppose that the original image l(x, y) is a piece-wise 
continuous function or belongs to C n , with n not known a priori. If we filter the original 
image f(x,y) with an appropriate rotational filter, then /(x, y) is analytic both in x and ?/, 
and D~](x,y) = 0 defines a differential function. On the other hand, if we use a directional 
filter /, for example along x, we have 

H x > v ) = /(*, y ) * fi x ) (*'>••'0 

and there is no reason for l(x, y) to be a three times differentiable function of y. Therefore, 
if the original image has been filtered with a directional operator only, it is possible that 
the zero-crossings of P~ 'l(x, y) may not be smooth curves. 

5.3. Morse functions 

/■s 

A function f:P~ > > It is called a Morse function if at its critical point (i.e., points where 
grad/ 0), the Hessian is nondegenerate. Morse functions have the following properties: 


21 
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(a) Suppose that /(*,?)) = 0 and Igrad/I,, = 0 with /' = (i,y), but the Hess (/),-, is non 
degenerate. Thus, there is a smooth local change of coordinates around /’|grad(rr, y) i} \, 
such that / takes the exact form 


fix, Jl) 


irr-v, 


a- f 




(b) A small enough perturbation of a Morse function / can always be expressed in the 
same form as the original / by a change of coordinates and of scale. 

Property (a) says that around P the function / has the behavior of the quadratic form 
induced by the Hessian. Property (b) is a kind of structural stability property. A basic 
property of Morse functions is that they are dense so that, if / is a non-Morse function, then 
an arbitrary small perturbation of / makes / a Morse function (obviously the perturbation 
must not vanish at the critical points). This is the reason of the importance of Morse 
functions here: we can always assume that images are Morse functions (especially because 
of the unavoidable noise). 

5.4. Classification of z.c. 


We now analyze the geometrical properties of the z.c. contours, i.e., the locus of points 
defined by 


D 2 l(x, y) — h(x, y) = 0. [5.7] 

(a) If h(x,y) is not a smooth function of x and y (at least C"), the implicit function theorem 

f**\ cannot be used and the z.c. may be isolated points, i.e., segments of intersecting curves 

and 2-D regions. 

(b) If h(x,y) is a smooth function of x and y and if in P = (x,y) we have 

h(x, y) == o and |grad/t(i, y)\ f , ^ 0, 

then h(x,y) has in P a “transversal zero-crossing", which is a smooth curve. 

(c) If h[x,y) is a smooth function and in P we have 

h{x,y) = 0 and |grad/»(x, 2 /)p| = 0 [5.8] 


but around P, h(x, y) we find 

h(x, y) — ax' 2 + bxy + cy 2 + 0(x n y m ) n + m — 3, [5.9] 

where a — \} xx \ f >, b — c = \j vy \[,. The zero crossing P is: 

(i) an elliptic z.c., if Hess h{x,y)\ r > () (see Fig. 2A). 

(ii) a hyperbolic z.c., (saddle point) if Hess h{x, y)\ j ,< 0 (see Fig. 2B). 

(iii) a parabolic z.c., if Hess h{x, y )\ r = 0 but «, b and c are not identical to zero (see Fig. 
2C). 

(d) If h(x,y) is a smooth function and if in P we have 


h{x, y) 


0 [grad/(.T, ?/)]/,— 0, 


and in /', h(x, y) depends on the third order terms, 
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h(x, y) ~= ax 3 + fix 2 y + <jxy~ + 8y 3 + ()(x n y rn ), n + rn — 4, (5.10) 

where the coefficients a,p,j and 6 are obtained by the Taylor expansion. It is easy to see 
that the set of points 

Ha — {( x t y):ax 3 + fix 2 y + 7 x.y 2 + 8y 3 = ()} (5.11) 

are straight lines. The z.c. lines may be: 

(i) an elliptic umbilic, if lt A consists of three lines (see Fig. 2D). 

(ii) a hyperbolic umbilic, if fi A consists of a single real line (see Fig. 2E) 

(iii) a parabolic umbilic, if IIa consists of three lines, two of which are coicident (see Fig. 
2F) 

(iv) a symbolic umbilic, if Ra consists of three coincident lines. 

(e) If h(x, y) is a smooth function and in P we have 

h(x, y) = 0 

and in P, h(x, y) depends on the fourth order terms, the z.c. lines have a complex shape 
that can be analyzed using results of Poston & Stewart (1976). 


Bifurcations of zero-crossings 

The isotopy theorem (Thom, 1954, Abraham and Robbin, 1967) shows that transversal 
intersections are structurally stable, i. e. that “transversal zero-crossings" are structurally 
stable: their topological properties do not change if the size of the filter is slightly changed. 

If f{x,y) is a Morse function then S f may meet S u non-transverally, and these intersections 
are not structurally stable (observe that Morse functions are structurally stable but not their 
intersections with S„). If / is a Morse function, then S f may meet S„ non-transversally at 
elliptic points and hyperbolic points. These intersections are not structurally stable and may 
change their topological properties for small perturbations of /. More specifically we may 
have two bifurcations: 

(i) Elliptic z.c. At elliptic z.c., a small perturbation of / may lead to the disappearance of 
the z.c. or to the appearance of a contour of z.c. constituted by a closed curve. 

(ii) hyperbolic z.c. At hyperbolic z.c., which consists of the intersection of two curves any 
small perturbation leads to the breaking of the intersection of the two curves and the 
appearance of two disjoint curves. 

These are the two bifurcations that may appear when k(x, y) is a Morse function. Interestingly 
enough, the zero-crossing contours obtained with real images (which will be explored in a 
later paper) can be classified as type (b) and (c) of the previous section; Morse functions 
can have z.c. only of type (b) and (c). The two types of bifurcation, that may originate with 
Morse functions are illustrated in Fig. 3A and 3B, respectively (see also Koenderink and 
van Doom, 1979). Yuille and Poggio (1983a. 1983b) have shown that (if Gaussian filtering 
is used) when the scale of the filter is changed (i.e. a), the second type of bifurcation 
may appear either when <7 is increased or decreased, but the first type of bifurcation only 
/""'s occurs when a is increased. Thus, the Gaussian filter forbids creation of a zero-crossing 

contour from an elliptic z.c. for increasing a. It is important to note that all these topological 
properties are also valid for level-crossings. Thus setting a threshold in the output of the 
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Figure 3 The zero-crossing points may be of the elliptic (A), hyperbolic (B), parabolic 
(C) type; the zero-crossing lines can also be an elliptic umbilic (D), a hyperbolic umbilic 
(E) or a parabolic umbilic (F). See text. 

filtering and derivative operation preserves all topological and geometrical properties of 
zero-crossings. 

In summary, we have characterized the geometrical properties of zero-crossing contours: 
these properties — for instance the fact that zero-crossing contours are dosed — may be 
exploited in various ways in edge detection and even in stereo- or motion-matching. 
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elliptic z.c. 

/ 




Figure 4 The two types of bifurcations that can occur for increasing (left to right) and 
decreasing (right to left) a in the case of Morse functions. See text. 

6. EDGE CONTOURS AND FILTER SCALE 


As we have seen, differential operations on sampled images require the image to first be 
smoothed by filtering. The filtering operation introduces an arbitrary parameter - the scale 
of the filter, e.g., the standard deviation for the Gaussian filter. In computer vision, the 
advantages of using several scales of filtering was realized quite early on, and this was 
supported by evidence suggesting the presence of filters of several sizes in the human visual 
system (Rosenfeld, 1982; Marr, 1976; Marr and Poggio, 1979; Marr and Hildreth, 1980). More 
recently, Witkin (1983; see also Stansfield, 1980) introduced a scale-space description of 
zero-crossings which gives the position of the zero-crossing across a continuum of scales, 
i.e., sizes of the Gaussian filter (parametrized by the o of the Gaussian). The signal—or the 
result of applying to the signal a linear (differential) operator—is convolved with a Gaussian 
filter over a continuum of sizes of the filter. Zero- or level- crossings of the filtered signal 
are contours on the x n plane and surfaces in the x,y,n space. Witkin proposed that 
this concise map can be effectively used to obtain a rich and qualitative description of the 
signal. Yuille and Poggio (1983a, 1983b) — who called the maps of zero-crossings across 
scales fingerprints — have established interesting relationships between multiresolution 
analysis, the Gaussian filter and zero-crossings of filtered signals. Their main results are 
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two theorems: 

(a) zero- and level-crossings of an image filtered through a Gaussian filter have nice scaling 
properties, i.e., a simple behavior of zero-crossings across scales. Zero-crossings are not 
created as the scale increases. The Gaussian filter is the only filter that has this nice scaling 
behavior (see also Babaud, Witkin and Duda, 1983). 

(b) The map of the zero-crossings across scales determines the filtered signal uniquely 
for almost all signals in the absence of noise. The scale map obtained by Gaussian filters 
is thus a complete representation of the image. This result applies to level-crossings of 
any arbitrary linear differential operator of the Gaussian (modulus the null space of the 
differential operator and provided there are at least two zero-crossing contours), since it 
applies to functions that obey the diffusion equation. 

The first result sheds some light on the properties of zero-crossings and level-crossings 
at different scales with the Gaussian filter. It supports the use of the Gaussian filter in 
a multiresolution edge detection scheme. Reconstruction of the signal is, of course, not 
the goal of early signal processing. Symbolic primitives must be extracted from the signals 
and used for later processing. The second result implies that scale-space fingerprints are 
complete primitives, that capture the whole information in the signal and characterize it 
uniquely. Subsequent processes can therefore work on this more compact representation 
instead of the original signal (see Asada and Brady, 1984). 

The second theorem has theoretical interest in that it answers the question of what 
information is conveyed by the edges identified with zero- and level-crossings of multiscale 
Gaussian filtered signals. It is furthermore interesting that this complete representation 
happens to coincide with the basic scheme for edge detection discussed in this paper. 
From this point of view it can be argued that the fingerprint representation makes explicit 
exactly the information that is needed on physical grounds, i.e., it makes explicit edges in 
the image. 

It may be asked at this point what the right sequence is for the two steps of differentiation 
and filtering. For linear operators the order is of course immaterial, since they commute. 
It is not so for nonlinear operators, such as the directional derivative along the gradient. 
The regularization argument for the filtering step implies that filtering at one scale must 
precede the differentiation operation. The computation of different scales requires filtering 
at a range of resolutions after differentiation. The reason is that the theorems of Yuille and 
Poggio (1983a, 1983b) hold true even for the identity operator, but are not necessarily valid 
if filtering is performed before a nonlinear differential operation, in particular, Gaussian 
scaling after the nonlinear directional derivative along the gradient does not have a nice 
scaling behavior. Thus filtering as a regularizing operator must be performed first at one 
scale and filtering at different scales must be performed after the differential operation. For 
linear differential operators, this is equivalent to a multiscale filtering either before, after, or 
together with the differential operation (e.g. the Laplacian of the Gaussian). 


7. OVERVIEW OF SOME EDGE DETECTORS 

In this section, we will briefly compare our main conclusions with several edge detectors 
presented in the literature. Our review is neither intended to be exhaustive nor does it aim 
to present edge detectors in full detail. 

7.1. Difference of boxes (DOB) 

Binford and coworkers (Herskovitz and Binford, 1970; Horn, 1972; Binford, 1981) have 
suggested the use of support-limited filters in the filtering step of edge detection. They have 
used the Haar function [3.8b] in directional filtering or a difference of functions of the type 
[3.8a] for rotational filtering. There are two problems using this approach: 
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(1) Filtering with support-limited functions does not regularize the image intensity profile; 
therefore the use of any differential operator is unsafe. 

(2) A strictly support-limited filter, such as a DOB, cannot be correctly sampled, and it is 
very difficult to obtain a good digital representation. 

7.2. Shanmugam, Dickey and Green 

Shanmugam, Dickey and Green (1979) looked for a linear, band-limited operator that would 
yield maximal output energy within a given spatial interval in the vicinity of the edge. No 
explicit reference was made to a differentiation step in edge detection. They proposed that 
the optimal filter for an ideal edge S(x), has a Fourier transform 


(k x ^ x {^,c) |f!| < r 

\0 |fl|>r’ 


[7-t] 


where fc, is a constant, iIh{x,c) is a linear prolate function (see section 3.1). This edge 
detector performs very poorly on localization and has the intrinsic feature of giving two 
maxima of energy in the output of the response to an edge. The reason is simply that 
using an even filter such as eq. [7.1] which has the same shape of edges are 

located at the zero-crossing of the output and not at the extrema. Moreover, these authors 
use properties of linear prolate functions (their eq. 1) to derive their optimal filter which are 
valid in 1-D, but not in 2-D when linear prolate functions are extended in 2-D by rotation. 
In addition, their asymptotic approximation to the optimal filter was incorrect, as shown by 
Lunscher (1983). 


7.3. Marr and Hildreth 

Marr & Hildreth (1980), and Hildreth (1980), extending the work of Marr and Poggio (1979), 
have proposed an edge detection scheme based on a filtering step consisting of a 2-D 
symmetric Gaussian followed by the localization of zero-crossings of where /(*, y ) 

is the filtered image. This edge detector performs rather well, but its optimality was not 
rigorously proved. Indeed, 

(1) in many instances achieves a better localization than V 2 , particularly for rounded 
edges with large curvature. 

(2) The use of directional filters and directional derivatives when performed correctly does 
not give rise to the problems that forced Marr and Hildreth to reject such edge detection 
schemes (see section 4.1). The use of two directional filters with directional derivatives may 
be as efficient as the Marr-Hildreth scheme, with the advantage of not introducing spurious 
edges that appear with rotational filtering because of the closure property of z.c. contours 
(see section 5). 


7.4, Haralick 

Haralick (1980, 1981, 1982) has proposed a scheme for edge detection in which a pixel is 
marked as a step edge pixel if, in its neighborhood, there is a zero-crossing of the second 
directional derivative taken in the direction of the gradient. Haralick, in order to evaluate 
the derivatives he approximates, interpolates the sampled intensity values with discrete 
Chebychev polynomials. There is no explicit mention of a filtering step. Canny (1983), 
however, has shown that the above procedure is practically equivalent to using a filtering 
step (in our terms, a regularization step) before differentiation. The type of equivalent filter 
depends on the set of approximating functions and on the degree of differentiation required. 

7.5. Canny 

Canny (1983) has investigated the desirable properties of an optimal edge detector, based 
on efficiency of detection and reliability in localization. We have already seen that detection 
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of an ideal step edge is favored by broad filters while localization is favored by small filters. 
Canny has shown through variational methods that the optimal odd filter f op {x) (according 
to his criteria) in the 1-D case is the linear combination of four exponentials. 

-x 2 

Interestingly x) is very close to , which is the optimal odd filter from the point of 
view of minimal uncertainty. The treatment of Canny may also be seen as a well-founded 
justification for the use of filters with minimal uncertainty, because simply by first changing 
some constraints in his variational approach it is possible to obtain the second Hermite 
function. 2 Canny’s procedure for finding two-dimensional step edges and other types of 
edges uses directional operators of varying width, length and orientation. This procedure, 
which includes as an essential part an appropriate thresholding, works remarkably well 
on real images. His justification of the choice of directional operators is not completely 
satisfactory. Indeed: 

(1) For 2-D images, Canny uses two alternative differential operators, either ^ (see section 
4.2 and Havens and Strickwerda, personal communication) or directional operators. The 
preference for directional operators originates from his one-dimensional treatment of the 
problem. The optimal filter is chosen to be an antisymmetric function, because it is designed 
to detect maxima. Therefore the corresponding 2-D operator is not rotationally invariant, 
suggesting the use of directional operators for 2-D images. The output of directional 
operators can be directly used in the adaptive threshold scheme used by Canny, offering 
advantages with respect to the symmetric operator 

(2) As already mentioned in section 4.1, to obtain all edges in a 2-D image it is sufficient 
to use only two different directional derivatives. The use of more than two orientations is 
useful only to increase the signal-to-noise ratio, but is not required for edge detection in a 
noise-free 2-D image. 

8. DISCUSSION 

We will now summarize the main points of our analysis of edge detection. 

A. The first step in edge detection, after sampling of the image, consists of a filtering 
stage followed by a differentiation stage. Filtering has the main function of regularizing 
the iil-posed nature of edge-detection and should be performed before the differentiation 
operation. Filtering for the purpose of multiresolution analysis should be performed after 
the differentiation operation, when nonlinear differential operators are used. 

B. To be physically realizable, digital filters should be represented with a good approximation 
by a finite sequence of samples of points. From this point of view, a Gaussian or the first 
linear prolate (<f>„(x,c)) function are practically equivalent. Filtering with prolate functions 
regularize “more" the image (the image becomes entire and band-limited), whereas using 
a Gaussian the image becomes only entire. The Gaussian filtering, however, has two 
advantages over prolate functions: 

(i) It does not create z.c. when the size of the filter is increased (see section 6). 

(ii) In 2-D, the Gaussian decomposes into the product of 1-D Gaussians: as a 
consequence, it is particularly easy to reduce drastically the amount of computations 
involved in its use. 

C. Filtering of the image with a rotationally symmetric filter insures with high probability the 
closure of z.c. contours (see section 5). Filtering the image with directional filters does not 
ensure closed z.c. contours. Localization, however, is more accurate. 

D. Several types of derivatives at different scales may be needed for detecting and labeling 
intensity changes under the most general conditions. In the differentiation step, directional 

"Recently Spacok and Brady have investigated split-gaussian filters similar to Canny's but with 
poorer signal-to-noise ratio and bettor localization. 
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derivatives in only two directions are necessary, when DD operators are used. When RID 
operators are used, ■ performs better then V 2 in localization, but ^ has the disadvantage 
of not commuting with the convolution. 

E. In order to characterize the types of intensity changes in the image in terms of the 
physical properties that have originated them, it is useful to have a set of hierarchical 
symbolic descriptions. The lowest symbolic description uses as a substrate the associated 
fingerprints of the image, containing the map of zero crossings and their slope at different 
scales, and provides a local labeling of edges still in terms of image data. The final symbolic 
description must label edges in terms of the properties of the physical surfaces that originate 
the intensity changes, and therefore as object boundaries, shadows, reflexes, changes in 
texture, specular reflections, etc. This final representation of the type of the primal sketch 
is obtained using high level knowledge and geometrical reasoning from lower symbolic 
descriptions. 

In later papers, we will evaluate performance of different filters and different operators in 
real images, and we will outline a theory of a symbolic description of edges. 
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1. Appendix: Differentiation through Taylor expansion 


In previous sections, we have seen that to safely perform differentiation, it is necessary to 
smooth the data by some appropriate (analog or digital) filtering. If this filtering has removed 
enough high frequencies, so that our filtered image is band-limited function, and by the 
Paley-Wiener theorem (Boas, 1954) is also an entire function (see Section 3), numerical 
differentiation can be performed in a computationally more efficient way through Taylor 
expansion. 

If f(x) is entire, than f(x) is also analytic and the Taylor series 

/(*) = h + (* - **)/; +... + +... [i] 

has an infinite radius of convergence. If we have 2 n + 1 sampled points from [2], we can 
obtain 2 n linear equations from which we can solve for f^\ j = 1,2, ...,2n. 

Three equidistant points give 


f'k — ^(/fc+l “ fk-l) 

Ik = -^(fk-i - 2/* + fk+i) 

With five equidistant points, we obtain 


I'k = j{Ik~2 ~ 8/ifc-l + $fk+ ~ fk-‘l) 

f'k — X2h ;, ( “ Ik -- 2 + 1 l>Ik- 1 — 30 Ik + 10/,M -- Ik + 2 ) 




When the performances of the numerical differentiation obtained through spline interpolation 
(eqs. 2.9 - 2.10) are compared with those obtained by Taylor expansion (equations 3-4 in 
this appendix), it turns out that the first method gives more accurate and consistent results 
with noisy data while the second method is more efficient with data that are already smooth. 
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/'"“s 2. Appendix: Sampling 

Since image processing is performed in terms of discrete representations of signals and 
filters, it is important that manipulations of sampled images and filters have a meaningful 
connection with the original image /(.r) and the analytic form of the filter f(x). More precisely, 
for linear filtering, if /, is a discrete sequence of points of I{x), and /,• is another discrete 
sequence of points of /(*), the discrete convolution 

s< = 53 /fc '^- < l 1 ! 

k 

should be related to the exact convolution 

y{ x ) = J i{y)f{y - x ) d v = /(*) * /(*)• [ 2 ] 

This relation is clarified by standard results (Oppenheim and Johnson, 1972; Borsellino and 
Poggio, 1973). 

Suppose that we may represent l(x) and f(x) as 



i{' x ) = 

i 

[3a] 

/’“‘N 

/{*) = 

i 

[36] 


where </>,•(*) = sinc[|(x - *'/*.)]. Then 



l(x) * f(x) = g(x) = 



where g, = E* hfk-i- 

Thus from the discrete convolution of the sampled values [2], it is possible to recover 
completely, the exact convolution of the original image with the filter. It is now possible to 
represent a signal in the form of equation [1] when the signal is band-limited and correctly 
sampled. If one uses band-limited filters it is possible to obtain the required representation 
[3b] for the filter. For an arbitrarily sampled image, however, it is difficult to obtain the 
required representation [3a]. Indeed it would be necessary to sample the image according 
to the cutoff frequency of the optical system used in the imaging process, which is generally 
too high to be of practical use. This is related to the classical problem of aliasing. The 
simplest way to obtain a reasonable solution to the problem is to initially filter the image 
with an appropriate band-limited filter, before any further operation. 


2 




Torre, Poggio 


On Edge Detection 


/•N 


3. Appendix: The geodesic curvature 


It may be of some interest to ask whether the line on the surface A' (x, y, f(x, ?/)), whose 
tangent is always in the direction of the gradient is also a geodesic. A geodesic is aline 
whose geodesic curvature A'„ is always zero. In what follows we will briefly answer this 
question The surface A' has the first fundamental form [4.6]. 

As shown later, the geodesic curvature of a curve on surface X whose tangent is always 
in the direction of the gradient is 


K„ 


9 //(/f 0 + ffttvv 


j <7 + 7 j[l + /;(/! + /£)]} 


3/2 • 


[t] 


Now, K g = 0, when f \ — f 2 y and f xx — f yy , that is, for surface of revolution. Therefore the 
line on surface A', which is always directed along the gradient, has the following properties: 

(1) its normal curvature K n is equal to x 

(2) it is a geodesic, only for surface of resolution. 

Let us now compute K g . Given that the coefficients of the first fundamental form are 




E=l+fl F = f x fy G = l + f 2 y , 
The associated Cristoffel symbols F{, Pf x satisfy the equations 


V^EF + ViU = -FE x 


r \ X FF + Pf, EG = F X E - i EEy. 


[ 2 ] 

[»] 


or 


I,i 


2 [\{FE X + EE y )~ F X E 


F 2 - EG 


[4] 


, 10 X i{ri?-Kv)-r* 


= 2~E 


*'-¥ 


Finally, 




I'n 


[{fsfy 2/,/*x + (I 4 flWxhv) - (I 4- /?)(/**/„ + /*/*„) 

m n n nn 

Z (l_+ fDf rr fy hify 

r r 


I Aj ^ 2 f’ _ _ fxzfy. f{fy_ 

' 2 E ~ 11 E “ I • f 2 >* I 4 /; 

/*/„(■ 1 /;-i /;). Ilf, 

'.(i ‘ f'ih 2 

Ur, 


[«] 
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Similarly, I'J 2 l'j 2 satisfy the equations 


rj,h'F + i 1 2 F 2 


2™’ 


[ 8 ] 


v' i2 ff + r'f,/^ = -UGM)- 


Some algebra gives from [9] and [ 10 ] 


/x/x 


[ 9 ] 


[ 10 ] 


P 2 __ fy fxy r 11 ] 

1,2- g2 • [HJ 

Now suppose that X = (x,y,f(x,y)) is a regular parametrization of our surface around P, 
and let x — x{t),y = y(t) be the equation of the curve 7 in a neighborhood of P. Then 
(Pogorelov, 1965) 


K a 


s/PG - F 2 


{Ex ’ 2 + 2 Fx'y' + Gy' 2 ) 


3/2 


-{x'W-y H x' + Ay'- Bx'} 


[ 12 ] 


is the geodesic curvature of 7 in P, where 

A = Y\\x' 2 +2F\ 2 x'y' + V\ 2 v'* (*>«) 


[ 13 a] 


n = r 2 n x'° + 2 v‘i 2 x , y ' + r \, y ’\ [136] 

If we want to compute the geodesic curvature of a curve 7 which lies on the surface X 
and always has the tangent vector in the direction of the gradient of f{x,y), we must find 
the equation of the curve (x(t),y(t.)). Let us suppose 


x(t) = t 


[ 11 ] 


and therefore 


dy = f v (x(t ), y(t)) , , 

dt IMt),v{t)Y 1 J 

Now equation [15] defines a differential equation in y = y(t) whose solution completes the 
equation of the curve. We have x(t) — x and y(t) such that 


x'{t) = I x"{t) 0 



?/"(/) 


7, v h 


+ t(/,, - 5,A 

j x 


[18] 


Now let us compute 
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x"y' - ?/V + Mi - IM = - l ff ( 1 - Ji ] + 5 f{iyy ~ hr ) 


Now 


9 l f. 
1 


+ 4ih hhx + 2 hhyfy- + hh ' y 


fx 


n 

ivy i-a 

J X 


r 


/ f 2 

fyhx + 2 fyfxy ~jT~ + fyfyy~f .J 

y x y 2 


(/I ‘W 2 ^ /ii)+ ? 2{()} - 


\//';G - F 2 = g 


and 


[17] 






Ljx / 2 + 2 Fx'y' + G'l 


C 1 + fl) + ‘ZfxfyJ- + (l + /y)-p 

y x j x 


— 1 + /x + 2/ 2 + -TJ + yf 

y x y x 

— + /l + 2 /i/t + 1 + /y} 

y x 

= 7^{/i0 + /y) + fx + l /y(/x + fl) + !}• 

y x 


Finally, we have 


5 ‘ /?(/i I ) + /|(/yy /**) 

+ n) + 1 + / 2 / 2 +/<}}*/* 

Tf(?f — 0 + 7 i(hv ~ f' xx ^ 

Jg+jA' + flifl + flYm' 


The geodesic curvature K g is zero when 


fxy 

fx 



+ -F(f . 
r j > \Jyy 


fxx) = 0. 


Eq. [20] is not generally satisfied; it is satisfied for surfaces such that 


f 2 / 2 

y x J y 
fyy ~ fxx 

as a sphere or a surface of revolution. 


[18] 


[19] 


[ 20 ] 


(H) 
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4. Appendix: Approximation and interpolation 


The traditional procedure for performing numerical differentiation of sampled functions 
is to interpolate using polynomials or related functions and to analytically differentiate 
the interpolated function. This procedure is usually justified if the interpolated function 
converges uniformly to the original function as the number of samples increases. Most 
of the classical results in the interpolation and approximation of functions deal with this 
problem. In the case of images the main problem is however different: approximation for 
the purpose of differentiation has to be robust against noise. The solution to this problem 
has to be sought in regularization theory, as we explained earlier. In this appendix, we 
discuss some of the classical results on interpolation and approximation for completeness 
and not because they are directly relevant to the problem of regularizing edged detection. 
The reader will notice, however, that there are several connections between the classical 
results outlined here and our approach described in the the main body of this paper. 

Uniform convergence for polynomial interpolation is not guaranteed even when the original 
function is C 00 , or when it is analytical. Uniform convergence on bounded sets requires the 
original function to be entire. From the Paley-Wiener theorem (Boas, 1954) we know that 
this is the case with band-limited functions. 

If the original function is analytic, numerical differentiation may be performed using an 
appropriate Taylor expansion without interpolation. Interestingly, almost every function is 
made analytic by filtering with a Gaussian, since the filtered function is a solution of the 
heat equation (Widder, 1975). Thus, in order to safely perform numerical differentiation, it 
is necessary to use band-limited or Gaussian filters. 

Note that if approximation (in the Weierstrass-Bernstein sense) rather than interpolation is 
used, we obtain uniform convergence for all continuous bounded functions on bounded 
sets. Therefore, differentiation through approximation is successful on bounded sets for all 
bounded C' functions. We will see, however, that convergence is too slow for this approach 
to be practical. 

In what follows, we will use a one-dimensional approach for the sake of simplicity, but all 
our conclusions and results can be easily extended to two dimensions. 


4.1. Interpolation and differentiation 


Consider a function f{x) defined in [a,6] and have a < x„ < x { < x 2 < ... < a* < ... < 
x n < b distinct points, and 


Ik = f(xk) t 1 ] 

the values of / at x k . It is well known that there exists a polynomial ;>„(*) of degree n such 
that for the given values x 0 < x x ... < x n takes the values yiuin ■ ■ •> y n (Davis, 1975). This 
polynomial is 


Pn(x) 


Vkfk{x), 

0 



where / J: (.r) are Lagrange polynomials 
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lk(x) 


_ (:r - x 0 )(x - x . .(x - x k ^ } ){x - x k+] ).. .(x - x n ) 

(x k - x 0 )(x k - x.t). ,.(x k - x k ^i)(x k - x k+l )...(x k - x n )* 


[3] 


From equation [2] and [3], we consider the best way to estimate the derivative of f{x) in 
xkjf'{xk) by computing 


Vn( x k) y ' ykl k i x ) x=x k - [ 4 ] 

0 

In order to use this procedure reliably we need to know that in some way p n (x) is a good 
approximation of f(x) outside the sampled points. We would like to know that by increasing 
the number n of sampled points x n in [a, 6], we have 

lim nMOO p„(x) = /(*). [ 5 ] 

This is equivalent to uniform convergence. At the beginning of the century, Bernstein (see 
P. Davis, 1975) proved that equidistant interpolation over |z| < 1 to the function y = |z| 
diverges for 0 < |*| < 1 ; that is, continuity of f{x) is not sufficient to ensure uniform 
convergence. Runge showed that even if f(x) is analytical in [«, 6 ] uniform convergence 
may fail (P. Davis, 1976). For the function f(x) = 1/(1 + z 2 ), Runge showed that if p n {x) 
interpolates /(*) at equidistant points in [-5,5], p n (x) converges to / only in |z| < 3.63... and 
diverges outside the interval. Although f(x ) is analytic in 5ft is not analytic in the complex 
plane G and the singular points ±i induce this divergence (see P. Davis, 1975). To obtain 
uniform convergence of p n (x) to f{x) in {a,b\, it is necessary for f(z) to be analytic in a 
subregion of the complex plane C containing the segment [a, 6 ] (see Theorem 4.3.1 of P. 
Davis, 1975). 

it may be useful to remember that polynomial interpolation is not optimal if the sampled 
points are equidistant. Polynomial interpolation is optimal if interpolation of f(x) in [ 0 , 1 ] 
of order n is carried at the zeros of the Chebychev polynomials T n {x). In general, the 
procedure of differentiation through interpolation with polynomial or related functions may 
badly fail when applied to arbitrary sampled functions. 


4.2. Differentiation of analytic-, entire- and band-limited functions 


If we know that f(x) is analytic in [a, 6] and we have no information about its behavior on 
the complex plane C we cannot safely use differentiation through interpolation. If f(x) is 
analytic in x k e [a, b], we have 


f(x) = f(x k ) + (s - x k )f {l \x k ) + ... + f [n) {x k ) + ... [ 6 ] 

in \x k - S , x k + tf]. If we have 2 n + 1 sampled points in \x k - 8,x k ^], from equation [6], we 
can obtain 2 » linear equations, from which we can solve for f U) (x k ),j — 1...2 n. In the case 
of three or five equidistant points we obtain the formulae shown in Appendix 1 . 

The main problem with this procedure of numerical differentiation is that we do not generally 
know the radius of convergence of the Taylor expansion [ 6 ] and, given an arbitrary sampled 
analytical function, we do not know how many points around x k fail in the convergence 
interval. When the class of original functions / is further restricted to entire functions, the 
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Taylor expansion [6] is valid in SR, that is, it has an infinite radius of convergence, and the 
above procedure can be carried out safely. 

Now let us suppose that f(x) is entire, /e/, 2 (SR), and 

limp^ocsup—= 8 [7] 

where 


M { P ) = max w=p |/(4| 


[ 8 ] 


with z e C and / extended to the complex plane. 

If <5 < oo then f(x) is said to be of exponential type 8 and by the Paley-Wiener Theorem 
(Boas, 1954; Achieser, 1956), f(x) is band limited with F(w) = 0 for \J\ > 8, where F(u>) 
is the Fourier transform of f{x). The Paley-Wiener Theorem represents the connection 
between entire and band limited functions. It may be useful to remember that the Gaussian 
is entire and belongs to L"{ SR) but falls off just too quickly to be of finite exponential type 
and therefore to be band-limited. 

If f{x) is band-limited (with cutoff frequency uj n ), and f(x) has been correctly sampled at 
equidistant points spaced h, the Shannon Sampling Theorem gives us 


j-oo 

/(4 = /< sinc 

—oc, 




where sines = In this case, we can compute f’ k exactly from [9] as 


[9] 


r , 1 r cos n(k — i) 

fk ~h 2 l > ■'*' ~i — 


-OO i}ik 

{fk + l ~ fk-l ) - ~( fk+2 ~ fk-2 ) + -(/fc + 3 “ fk-3 ) 


[10 <] 


Although [10] is exact for correctly sampled band-limited functions, it converges rather 
slowly. 


4.3. Approximation and Differentiation 


It is well known that if f(x) is continuous in [«,fc], for every given e > 0, we can find a 
polynomial p„(x) of sufficiently high degree such that 

1/(4 - P/I (41 < <- 11 < x < b. [18] 

This is the so-called Weierstrass approximation theorem. Now let us suppose that we have 
an f(x) continuous and bounded in [0, i] and we know the values of f(x) at equidistant 
points x k £. Instead of interpolating a polynomial through the known points, we can 
construct the Bernstein polynomial: 



Torre, Poggio 


On Edge Detection 


*»(*)= E/(*o(”)* fc (i-*r- fc - in] 

k~ 0 

Observe that /?„(0) = /(0) and B n (l) = /(t), but apart from 0 and 1 B n (x) is not in general 
equal to f{x k ). A fundamental theorem of Berstein shows that 


Iim„_oo Bn(*) = f(x) in 

Moreover, if f(x) e C 1 , we have 

[0,1], 

[13] 

lirriyi^oo-D^x) = / (x) in 

[0,1]. 

[14] 


Thus, Bernstein polynomials provide simultaneous approximation of the function and its 
derivatives. If we want to obtain an estimation of the derivative of a sampled function, we 
can construct the Bernstein polynomial from the sampled values, compute the analytical 
derivative of B n (x) and take it as an estimate of f'{x). The drawback of Bernstein polynomials 
is that their convergence is very slow as shown by the poor performance of a 3- or 5- point 
approximation. Therefore, if we have samples of a generic function of class C 1 , we can 
use Bernstein polynomials to obtain an estimation of the values of derivatives at sampled 
points, but many points are needed for a good estimate. If we have samples of an entire 
function, the most efficient procedure is to use the equations derived in Appendix 1. 




